Analyzing the Amazon Mechanical Turk Marketplace
Since the concept of crowdsourcing is relatively new, many potential
participants have questions about the AMT marketplace. For example, a
common set of questions that pop up in an 'introduction to crowdsourcing
and AMT' session are the following: What types of tasks can be completed
in the marketplace? How much does it cost? How fast can I get results
back? How big is the AMT marketplace? The answers to these questions
remain largely anecdotal and based on personal observations and
experiences. To understand better what types of tasks are being
completed today using crowdsourcing techniques, we started collecting
data about the AMT marketplace. We present a preliminary analysis of the
dataset and provide directions for interesting future research.
Modeling Dependency in Prediction Markets
In the last decade, prediction markets have become popular forecasting tools
in areas ranging from election results to movie revenues and Oscar
nominations. One of the features that make prediction markets
particularly attractive for decision support applications is that they
can be used to answer what-if questions and estimate probabilities of
complex events. The traditional approach to answering such questions
involves running a combinatorial prediction market, which is not always
possible. In this paper, we present an alternative, statistical approach
to pricing complex claims, which is based on analyzing co-movements of
prediction market prices for basis events. Experimental evaluation of
our technique on a collection of 51 InTrade contracts representing the
Democratic Party Nominee winning Electoral College Votes of a particular
state shows that the approach outperforms traditional forecasting
methods such as price and return regressions and can be used to extract
meaningful business intelligence from raw price data.
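The statistical pricing idea can be illustrated with a minimal sketch: proxy the correlation of two binary outcomes by the correlation of their contracts' daily price returns, then price the conjunction claim through the standard identity for correlated Bernoulli indicators, clamped to the Fréchet bounds. This is an illustrative simplification, not the paper's actual estimator; all names below are made up.

```python
from statistics import mean, pstdev

def returns(prices):
    """Simple returns from a daily price series."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def corr(xs, ys):
    """Pearson correlation of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

def price_conjunction(prices_a, prices_b):
    """Price the complex claim 'A and B' from two basis contracts.

    Uses P(A&B) = pA*pB + rho*sqrt(pA(1-pA)pB(1-pB)), where rho is the
    correlation of the outcome indicators; here we proxy rho with the
    correlation of daily returns (an assumption for illustration).
    The result is clamped to the Fréchet bounds so it stays a valid
    probability."""
    p_a, p_b = prices_a[-1], prices_b[-1]
    rho = corr(returns(prices_a), returns(prices_b))
    joint = p_a * p_b + rho * ((p_a * (1 - p_a) * p_b * (1 - p_b)) ** 0.5)
    return min(max(joint, max(0.0, p_a + p_b - 1)), min(p_a, p_b))
```

With perfectly co-moving basis contracts the estimate collapses to the upper Fréchet bound min(pA, pB), which is the expected behavior for fully dependent events.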
Summarizing and Searching Hidden-Web Databases Hierarchically Using Focused Probes
Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many such databases at once through a unified query interface. A critical task for a metasearcher to process a query efficiently and effectively is the selection of the most promising databases for the query, a task that typically relies on statistical summaries of the database contents. Unfortunately, web-accessible text databases do not generally export content summaries. In this paper, we present an algorithm to derive content summaries from "uncooperative" databases by using "focused query probes," which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. The content summaries that result from this algorithm are efficient to derive and more accurate than those from previously proposed probing techniques for content-summary extraction. We also present a novel database selection algorithm that exploits both the extracted content summaries and a hierarchical classification of the databases, automatically derived during probing, to produce accurate results even for imperfect content summaries. Finally, we evaluate our techniques thoroughly using a variety of databases, including 50 real web-accessible text databases.
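The probing idea can be sketched in a few lines: issue topic-focused keyword queries against the hidden database's search interface, and build a content summary (word-to-document-frequency map) from the retrieved sample. The toy database, probes, and topic-scoring rule below are hypothetical stand-ins, not the paper's full adaptive algorithm.

```python
from collections import Counter

# Toy stand-ins: a "hidden" database reachable only by keyword query,
# and topic probes derived from a classifier (all names hypothetical).
DATABASE = [
    "gene therapy trial results",
    "protein folding and gene expression",
    "stock market volatility report",
]
PROBES = {"Health": ["gene", "therapy"], "Business": ["stock", "market"]}

def query(db, keyword):
    """Simulate the search interface: return docs matching a keyword."""
    return [d for d in db if keyword in d.split()]

def focused_summary(db, probes):
    """Probe with topic-focused keywords and summarize retrieved docs.

    Returns (best_topic, content_summary), where the summary maps each
    word to its document frequency in the retrieved sample -- a rough
    sketch of the focused-probing idea, not the paper's algorithm."""
    sample, hits = set(), Counter()
    for topic, keywords in probes.items():
        for kw in keywords:
            docs = query(db, kw)
            hits[topic] += len(docs)
            sample.update(docs)
    summary = Counter(w for doc in sample for w in set(doc.split()))
    return hits.most_common(1)[0][0], summary

topic, summary = focused_summary(DATABASE, PROBES)
```

In the full technique the probes for the winning topic's subcategories would be issued next, zooming the sample in on the database's actual topic coverage.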
Modeling Volatility in Prediction Markets
Nowadays, there is significant experimental evidence of excellent ex-post predictive accuracy in certain types of prediction markets, such as markets for elections. This evidence shows that prediction markets are efficient mechanisms for aggregating information and are more accurate in forecasting events than traditional forecasting methods, such as polls. The interpretation of prediction market prices as probabilities has been extensively studied in the literature; however, little attention has so far been given to understanding the volatility of prediction market prices. In this paper, we present a model of a prediction market with a binary payoff on a competitive event involving two parties. In our model, each party has some underlying "ability" process that describes its ability to win and evolves as an Ito diffusion. We show that if the prediction market for this event is efficient and accurate, the price of the corresponding contract will also follow a diffusion and its instantaneous volatility is a particular function of the current claim price and its time to expiration. We generalize our results to competitive events involving more than two parties and show that volatilities of prediction market contracts for such events are again functions of the current claim prices and the time to expiration, as well as of several additional parameters (ternary correlations of the underlying Brownian motions). In the experimental section, we validate our model on a set of InTrade prediction markets and show that it is consistent with observed volatilities of contract returns and outperforms the well-known GARCH model in predicting future contract volatility from historical price data. To demonstrate the practical value of our model, we apply it to pricing options on prediction market contracts, such as those recently introduced by InTrade. Other potential applications of this model include the detection of significant market moves and improving forecast standard errors.
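The flavor of the two-party result can be sketched under simplified assumptions: if the gap between the two ability processes is a standard Brownian motion and the contract pays 1 when the gap is positive at expiration, the fair price is Phi(X_t / sqrt(tau)) with tau the time to expiration, and Ito's lemma gives an instantaneous volatility that depends only on the current price and tau. This is a sketch of that special case, not the paper's general derivation.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal

def contract_volatility(price, tau):
    """Instantaneous volatility of a binary prediction-market contract.

    Assumes the underlying "ability gap" is a standard Brownian motion,
    so price_t = Phi(X_t / sqrt(tau)), tau = time to expiration.
    Ito's lemma then yields vol = phi(Phi^{-1}(price)) / sqrt(tau),
    a function of the current price and tau only (simplified sketch)."""
    return N.pdf(N.inv_cdf(price)) / sqrt(tau)
```

Note how the volatility blows up as tau shrinks for prices away from 0 or 1, which matches the familiar pattern of large late price swings in close races.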
Estimating the Socio-Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics
With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high volume of reviews that are typically published for a single product makes it harder for individuals as well as manufacturers to locate the best reviews and understand the true underlying quality of a product. In this paper, we re-examine the impact of reviews on economic outcomes like product sales and see how different factors affect social outcomes like the extent of their perceived usefulness. Our approach explores multiple aspects of review text, such as the lexical, grammatical, semantic, and stylistic levels, to identify important text-based features. In addition, we also examine multiple reviewer-level features, such as the average usefulness of past reviews and the self-disclosed identity measures of reviewers that are displayed next to a review. Our econometric analysis reveals that the extent of subjectivity, informativeness, readability, and linguistic correctness in reviews matters in influencing sales and perceived usefulness. Reviews that have a mixture of objective and highly subjective sentences have a negative effect on product sales, compared to reviews that tend to include only subjective or only objective information. However, such reviews are considered more informative (or helpful) by the users. By using Random Forest-based classifiers, we show that we can accurately predict the impact of reviews on sales and their perceived usefulness. Reviews for products that have received widely fluctuating reviews also have reviews of widely fluctuating helpfulness. In particular, we find that highly detailed and readable reviews can have low helpfulness votes in cases when users tend to vote negatively not because they disapprove of the review quality but rather to convey their disapproval of the review polarity.
We examine the relative importance of the three broad feature categories: 'reviewer-related' features, 'review subjectivity' features, and 'review readability' features, and find that using any of the three feature sets results in performance statistically equivalent to using all available features. This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by user-generated online reviews in order to estimate their socio-economic impact. Our results can have implications for the judicious design of opinion forums.
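The kind of text-based features discussed above can be pictured with a minimal extractor: sentence counts, average sentence and word length as crude readability proxies, and the share of opinion words as a crude subjectivity proxy. The feature names and the tiny opinion lexicon are illustrative stand-ins, not the paper's actual feature set.

```python
import re

def review_features(text):
    """Extract simple review-level text features (illustrative stand-ins
    for the readability and subjectivity features discussed above)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Crude subjectivity proxy: share of words from a tiny opinion
    # lexicon (hypothetical, for illustration only).
    opinion = {"great", "terrible", "love", "hate", "awful", "amazing"}
    return {
        "num_sentences": len(sentences),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
        "subjectivity": sum(w.lower() in opinion for w in words) / max(len(words), 1),
    }

f = review_features("I love this camera. The battery life is great. Autofocus is slow.")
```

Feature vectors of this kind, together with reviewer-level features, would then be fed to a classifier (e.g., a Random Forest) to predict helpfulness votes or sales impact.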
Demographics of Mechanical Turk
We present the results of a survey that collected information about the
demographics of participants on Amazon Mechanical Turk, together with
information about their level of activity and motivation for working on
Amazon Mechanical Turk. We find that approximately 50% of the workers
come from the United States and 40% come from India. Country of origin
tends to change the motivating reasons for workers to participate in the
marketplace. Significantly more workers from India participate on
Mechanical Turk because the online marketplace is a primary source of
income, while in the US most workers consider Mechanical Turk a
secondary source of income. While money is a primary motivating reason
for workers to participate in the marketplace, workers also cite a
variety of other motivating reasons, including entertainment and education.
QProber: A System for Automatic Classification of Hidden-Web Resources
The contents of many valuable web-accessible databases are only available through search interfaces and are hence invisible to traditional web "crawlers." Recently, commercial web sites have started to manually organize web-accessible databases into Yahoo!-like hierarchical classification schemes. Here, we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.
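The match-count idea can be sketched as follows: each category contributes a handful of probe queries, the database reports only how many documents match each probe, and the database is assigned to categories whose probes account for a large enough share of the matches. The probes and threshold below are made-up placeholders; in QProber the probes come from a trained document classifier and the coverage/specificity thresholds are tuned.

```python
# Hypothetical probes per category (QProber generates these
# automatically from a trained document classifier's rules).
PROBES = {
    "Sports": ["baseball", "playoff"],
    "Health": ["cancer", "vaccine"],
}

def match_count(db, keyword):
    """Simulate the search interface: number of matches for a one-word
    query (QProber needs only this count, not the documents)."""
    return sum(keyword in doc.lower().split() for doc in db)

def classify(db, probes, specificity=0.5):
    """Assign the database to categories whose probes account for a
    large enough share of all probe matches -- a bare-bones sketch of
    QProber's coverage/specificity rule, with a made-up threshold."""
    counts = {c: sum(match_count(db, kw) for kw in kws)
              for c, kws in probes.items()}
    total = sum(counts.values()) or 1
    return [c for c, n in counts.items() if n / total >= specificity]

db = ["the baseball playoff race", "a playoff upset", "vaccine news"]
```

Because only match counts are needed, classification costs a handful of cheap queries rather than a document crawl.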
The Dimensions of Reputation in Electronic Markets
We present a framework for identifying the different dimensions of online reputation and characterizing
their influence on the pricing power of sellers. Our theory predicts that sellers with better recorded online
reputation can successfully charge higher prices than competing sellers of identical products, and that their
pricing power increases with their recorded level of experience. We develop and implement a new text mining
technique that identifies and quantitatively assesses dimensions of importance in reputation profiles, and use
this technique to create a new data set containing detailed reputation profiles and prices for sellers in over
9,500 transactions for consumer software on Amazon.com's online secondary marketplace. The estimation
of a set of econometric models on this data set validates the predictions of our theory, and further, ranks
these dimensions of reputation based on their effect on measured seller value, identifying those that have
the most significant impact on reputation. This paper is the first study that integrates econometric and text
mining techniques toward a more complete analysis of the information captured by reputation systems, and
it presents new evidence of the importance of their effective and judicious design.
Information Systems Working Papers Series
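One simple way to picture the dimension-identification step: pull modifier-noun pairs out of feedback text and score each noun "dimension" by the polarity of the words that modify it. The fixed polarity lexicon and regex pairing below are hypothetical simplifications; the paper's technique assesses dimensions quantitatively from the data rather than from a word list.

```python
import re
from collections import defaultdict

# Hypothetical polarity lexicon; the actual technique scores
# dimensions from data rather than from a fixed word list.
POLARITY = {"fast": 1, "slow": -1, "careful": 1, "sloppy": -1, "quick": 1}

def reputation_dimensions(feedback_posts):
    """Score reputation dimensions (nouns) by the polarity of the
    modifiers that precede them across feedback posts. The lookahead
    in the regex lets consecutive word pairs overlap."""
    scores = defaultdict(int)
    for post in feedback_posts:
        for adj, noun in re.findall(r"(\w+)\s+(?=(\w+))", post.lower()):
            if adj in POLARITY:
                scores[noun] += POLARITY[adj]
    return dict(scores)

dims = reputation_dimensions(["Fast shipping, careful packaging.",
                              "Slow shipping but quick refund."])
```

Aggregated over many transactions, per-dimension scores of this kind could then enter a pricing regression as separate reputation variables.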
Relevance-based Retrieval on Hidden-Web Text Databases without Ranking Support
Many online or local data sources provide powerful querying mechanisms
but limited ranking capabilities. For instance, PubMed allows users to
submit highly expressive Boolean keyword queries, but ranks the query
results by date only. However, a user would typically prefer a ranking
by relevance, measured by an Information Retrieval (IR) ranking
function. The naive approach would be to submit a disjunctive query with
all query keywords, retrieve the returned documents, and then re-rank
them. Unfortunately, such an operation would be very expensive due to
the large number of results returned by disjunctive queries. In this
paper we present algorithms that return the top results for a query,
ranked according to an IR-style ranking function, while operating on top
of a source with a Boolean query interface with no ranking capabilities
(or a ranking capability of no interest to the end user). The algorithms
generate a series of conjunctive queries that return only documents that
are candidates for being highly ranked according to a relevance metric.
Our approach can also be applied to other settings where the ranking is
monotonic on a set of factors (query keywords in IR) and the source
query interface is a Boolean expression of these factors. Our
comprehensive experimental evaluation on the PubMed database and a TREC
dataset shows that we achieve an order-of-magnitude improvement compared to
the current baseline approaches.
Vagelis Hristidis was partly supported by NSF grant IIS-0811922 and DHS
grant 2009-ST-062-000016. Panagiotis G. Ipeirotis was supported by the
National Science Foundation under Grant No. IIS-0643846.
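The query-rewriting idea above can be sketched as follows: instead of one expensive disjunction, issue conjunctive subqueries in decreasing order of the maximum score their results can achieve, and stop as soon as no remaining subquery can push a new document into the top-k. The idf-sum scoring function and Boolean interface below are simplified stand-ins for the paper's IR-style setup.

```python
from itertools import combinations

def idf_score(doc_terms, query, idf):
    """Monotonic IR-style score: sum of idf over matched query terms."""
    return sum(idf[t] for t in query if t in doc_terms)

def boolean_and(db, terms):
    """Simulate a Boolean interface: docs containing ALL the terms."""
    return [d for d in db if set(terms) <= set(d.split())]

def top_k(db, query, idf, k=2):
    """Issue conjunctive subqueries from strongest to weakest, stopping
    early once no remaining subquery can improve the current top-k
    (a bare sketch of the early-termination idea, not the paper's
    full algorithm)."""
    subqueries = [c for r in range(len(query), 0, -1)
                  for c in combinations(query, r)]
    subqueries.sort(key=lambda q: sum(idf[t] for t in q), reverse=True)
    seen, results = set(), []
    for q in subqueries:
        kth = sorted(results, reverse=True)[k - 1] if len(results) >= k else 0.0
        if sum(idf[t] for t in q) <= kth:
            break  # no unseen doc can score above the current k-th
        for doc in boolean_and(db, q):
            if doc not in seen:
                seen.add(doc)
                results.append(idf_score(set(doc.split()), query, idf))
    return sorted(results, reverse=True)[:k]
```

Because the score is monotonic in the matched terms, a document not retrieved by any subquery issued so far cannot outscore the bound at which the loop breaks, so the early stop is safe.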